Large Language Model Node
The Large Language Model (LLM) Node generates AI-powered text responses using configurable language models from providers such as OpenAI, Anthropic, Google, Ollama, and VLLM. It supports system prompts for behavior configuration, conversation history for context-aware dialogues, and temperature settings for response creativity. The node works with both cloud-based commercial providers and self-hosted model deployments.
How It Works
The Large Language Model node connects to AI models through two deployment approaches: VIDIZMO's pre-configured cloud providers or models running on your own infrastructure. Pre-configured providers connect to commercial services (OpenAI, Anthropic, Google) through centralized credentials stored in the AI service configuration, eliminating API key management in individual workflows. Self-hosted models connect directly to AI servers on your infrastructure (Ollama, VLLM), providing full control over model deployment and data privacy while eliminating external API dependencies.
During execution, the node constructs a request containing your prompts and sends it to the selected model. For conversation-aware workflows, the node can include previous messages from conversation history, allowing the model to understand context from earlier exchanges. The model processes the request, generates a text response based on your configuration, and returns it to the workflow. The response is stored in the specified output variable, making it available for downstream nodes.
The node supports single-turn interactions where each request is independent, and multi-turn conversations where the model maintains context across exchanges. Conversation history is managed automatically when enabled, with each interaction adding to the accumulated context. This enables natural dialogue flows where the model can reference earlier parts of the conversation, though token usage increases over time as history grows.
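The request format is handled internally by the node, but conceptually it resembles a chat-style payload built from the system prompt, the user prompt, and (optionally) accumulated history. The sketch below is illustrative only, assuming an OpenAI-style message list; the field names are not the node's actual schema.

```python
# Illustrative sketch only -- the node assembles and sends this request internally.
# Field names assume an OpenAI-style chat payload, not the node's actual schema.

conversation_history = []  # grows only when history features are enabled

def build_request(system_prompt, user_prompt, model, temperature, include_history=True):
    messages = [{"role": "system", "content": system_prompt}]
    if include_history:
        messages.extend(conversation_history)   # prior turns add context (and tokens)
    messages.append({"role": "user", "content": user_prompt})
    return {"model": model, "temperature": temperature, "messages": messages}

# Each persisted turn appends to history, so token usage grows as the conversation continues.
request = build_request(
    system_prompt="You are a helpful assistant that provides concise answers.",
    user_prompt="Summarize the following text: ...",
    model="gpt-4",
    temperature=0.7,
)
```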
Configuration Parameters
Output Field
Output Field (Text, Required): Workflow variable where the AI-generated response is stored.
The output is the AI-generated text response as a string. The response format depends on the Output Format parameter (Text or JSON).
Text format output:
The document discusses three main topics: data security, compliance requirements, and implementation timelines. Each section provides detailed guidelines for enterprise deployment.
JSON format output:
```json
{
  "summary": "Overview of enterprise deployment guidelines",
  "key_points": [
    "Data security protocols",
    "Compliance requirements",
    "Implementation timelines"
  ],
  "sentiment": "neutral"
}
```
Common naming patterns: llm_response, generated_text, ai_answer, summary, analysis
System Prompt
System Prompt (Text, Optional): Instructions that set the AI's behavior, persona, or constraints.
System prompts define how the model responds across all interactions, establishing consistent behavior patterns. Variable interpolation with ${variable_name} is supported to insert dynamic context.
Examples:
- You are a helpful assistant that provides concise answers.
- You are a technical writer. Explain concepts clearly with examples.
- Analyze the following data and provide insights: ${context}
User Prompt
User Prompt (Text, Required): The main query or input for the LLM.
This parameter contains the actual question or task for the model to process. Variable interpolation with ${variable_name} is supported to dynamically insert data from previous workflow nodes.
Examples:
- Summarize the following text: ${input_text}
- What are the key points in this document?
- Generate 5 product descriptions for: ${product_name}
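Variable interpolation behaves the same way in both the System Prompt and User Prompt: `${variable_name}` placeholders are replaced with workflow variable values before the request is sent. Resolution is performed by the workflow engine; the sketch below only illustrates the substitution concept, using hypothetical variable names.

```python
import re

# Hypothetical workflow variables produced by upstream nodes.
workflow_variables = {
    "input_text": "Quarterly revenue grew 12% while operating costs fell 3%...",
    "product_name": "Acme Analytics Suite",
}

def interpolate(template: str, variables: dict) -> str:
    """Replace ${name} placeholders with workflow variable values."""
    return re.sub(r"\$\{(\w+)\}",
                  lambda m: str(variables.get(m.group(1), m.group(0))),
                  template)

print(interpolate("Summarize the following text: ${input_text}", workflow_variables))
```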
Output Format
Output Format (Dropdown, Default: Text): Format of the LLM response.
| Format | Output | Use for |
|---|---|---|
| Text (default) | Plain text response | Standard text generation, summaries, answers |
| JSON | Structured JSON response | Structured data extraction, API responses, data processing |
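When JSON format is selected, a downstream step that can evaluate code (the exact node type depends on your workflow) can parse the stored output variable directly. A minimal sketch, assuming the output variable `llm_response` holds the JSON string shown earlier; models can occasionally emit malformed JSON, so validate before relying on the structure.

```python
import json

# llm_response is the output variable written by this node when Output Format is JSON.
llm_response = (
    '{"summary": "Overview of enterprise deployment guidelines", '
    '"key_points": ["Data security protocols", "Compliance requirements", '
    '"Implementation timelines"], "sentiment": "neutral"}'
)

data = json.loads(llm_response)        # parse the structured response
for point in data["key_points"]:
    print(point)
```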
Include History
Include History (Toggle, Default: ON): Controls whether previous messages in the conversation are included for context.
When enabled, the LLM receives the full conversation history, enabling context-aware multi-turn dialogues where the model can reference earlier exchanges. When disabled, only the current message is processed without historical context, treating each request as independent.
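Conceptually, the toggle decides whether prior turns are prepended to the current request. The sketch below is illustrative only, reusing the chat-style message list from the earlier example.

```python
# Illustrative only: what the model sees with Include History ON vs OFF.
history = [
    {"role": "user", "content": "What is our data retention policy?"},
    {"role": "assistant", "content": "Records are retained for seven years."},
]
current = {"role": "user", "content": "Does that apply to video archives too?"}

with_history = history + [current]   # ON: the model can resolve "that" from earlier turns
without_history = [current]          # OFF: the follow-up arrives with no prior context
```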
Persist Message
Persist Message (Toggle, Default: ON): Controls whether the current interaction is saved to conversation history.
When enabled, the message is stored for future requests, building up context over time. When disabled, the message is processed but not stored in history, which is useful for one-off queries that should not affect conversation context.
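In short, Include History controls what is read from history for the current request, while Persist Message controls what is written back. A small, illustrative sketch of the write side:

```python
# Illustrative only: Persist Message decides whether the current exchange is written back.
history = []                           # stored conversation history
current_exchange = [
    {"role": "user", "content": "Does that apply to video archives too?"},
    {"role": "assistant", "content": "Yes, video archives follow the same policy."},
]

persist_message = True
if persist_message:
    history.extend(current_exchange)   # future requests that include history will see this turn
# When OFF, the reply is still produced but the stored history stays unchanged.
```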
Use Self-Hosted Model
Use Self-Hosted Model (Toggle, Default: OFF): Controls whether to use pre-configured models or provide custom model configuration.
| Mode | Configuration | Use for |
|---|---|---|
| OFF (default) | Models configured in AI service settings with centralized credentials | Commercial providers (OpenAI, Anthropic, Google) with VIDIZMO-managed authentication |
| ON | Requires Base URL, API Key, and other configuration in this node | Self-hosted models (Ollama, VLLM) or custom endpoints with direct authentication |
Model Provider
Model Provider (Dropdown, Default: Ollama): The AI provider that hosts the model.
| Provider | Example models | Authentication |
|---|---|---|
| OpenAI | GPT-4, GPT-3.5-turbo, GPT-4-turbo | API Key |
| Anthropic | Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Haiku | API Key |
| Google | Gemini Pro, Gemini 1.5 Pro | API Key |
| Ollama | Llama 3, Mistral, Phi 3, custom models | Base URL |
| VLLM | Custom models | Base URL |
Model Name
Model Name (Text, Required): The specific model identifier to use.
Different models have different capabilities, context windows, and pricing. Variable interpolation with ${variable_name} is supported for dynamic model selection based on workflow logic.
Examples: gpt-4, claude-3.5-sonnet, llama3, gemini-pro, mistral
Base URL
Base URL (Text, Conditional): The base URL endpoint for the model provider API.
Required when Model Provider is Ollama or VLLM. This is the base URL where the self-hosted model server is running. Variable interpolation with ${variable_name} is supported for dynamic endpoint selection.
Examples: http://localhost:11434, http://192.168.1.100:8000
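For a self-hosted Ollama deployment, it can help to confirm that the Base URL is reachable before wiring it into the node. A minimal sketch, assuming a standard Ollama server (its `/api/tags` endpoint lists locally available models); adjust the check for VLLM or other servers.

```python
import json
import urllib.request

base_url = "http://localhost:11434"   # same value you would enter in the Base URL field

# Ollama exposes GET /api/tags to list models pulled onto the server.
with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
    models = json.load(resp)

print([m["name"] for m in models.get("models", [])])   # e.g. ['llama3:latest', 'mistral:latest']
```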
API Key
API Key (Text, Conditional): The API key for authenticating with commercial model providers.
Required when Model Provider is OpenAI, Anthropic, or Google. The key authenticates requests and tracks usage for billing. Variable interpolation with ${variable_name} is supported to load keys from secure workflow variables.
Variables such as ${api_key} from secure sources are recommended instead of hardcoding keys in the workflow.
Temperature
Temperature (Number, Default: 0.7): Controls the randomness of model responses (0–2).
Lower values produce more deterministic and focused outputs; higher values produce more creative and varied outputs. Variable interpolation with ${variable_name} is supported for dynamic temperature adjustment.
| Range | Behavior | Use for |
|---|---|---|
| 0–0.3 | Deterministic, factual, consistent | Summarization, data extraction, factual Q&A, classification |
| 0.4–0.7 (default) | Balanced creativity and consistency | General-purpose tasks, conversational AI |
| 0.8–2.0 | Creative, varied, exploratory | Content generation, brainstorming, creative writing, ideation |
Max Token Limit
Max Token Limit (Number, Default: 16000): Maximum number of tokens the model can process in a single request.
This includes both input (prompts and conversation history) and output (generated response) tokens. Variable interpolation with ${variable_name} is supported.
Common model limits:
- GPT-4: 8,192 tokens
- GPT-4-turbo: 128,000 tokens
- Claude 3: 200,000 tokens
- Llama 3.2: 128,000 tokens
- Gemini 1.5 Pro: 1,000,000 tokens
Lower token limits reduce costs and latency. Higher limits support longer conversations and documents.
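Exact token counts depend on each model's tokenizer, but a rough rule of thumb (about four characters per token for English text) is often enough to sanity-check whether prompts, history, and the expected response fit within the configured limit. A hedged sketch of that heuristic:

```python
def rough_token_estimate(text: str) -> int:
    """Very rough heuristic: ~4 characters per token for English text.
    Use the model's own tokenizer when an exact count matters."""
    return max(1, len(text) // 4)

system_prompt = "You are a helpful assistant that provides concise answers."
user_prompt = "Summarize the following text: <document text here>"
history_text = ""                      # concatenated prior turns, if Include History is ON
expected_response_tokens = 1024        # budget reserved for the model's answer

input_tokens = rough_token_estimate(system_prompt + user_prompt + history_text)
print(input_tokens + expected_response_tokens <= 16000)   # stays within the default limit?
```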
Reasoning
Reasoning (Toggle, Default: OFF): Controls reasoning/thinking mode for supported models.
When enabled, the model uses extended reasoning processes before generating responses, producing more thoughtful and analytical outputs. When disabled, the model generates responses using standard processing. Variable interpolation with ${variable_name} is supported for dynamic control. This feature is available for models that support reasoning mode, such as Ollama thinking models.
Common Parameters
This node supports common parameters shared across workflow nodes, including Stream Output Response, Logging Mode, and Wait For All Edges. For detailed information, see Common Parameters.
Best Practices
- Use system prompts to establish consistent AI behavior across multiple requests in a workflow
- Use variable interpolation to build dynamic prompts that adapt to upstream node outputs
- Use lower temperature values (0–0.3) for factual tasks and higher values for creative tasks
- Set token limits to match model capabilities to avoid truncation
- Enable conversation history for multi-turn dialogues where context matters, but monitor token usage as history accumulates
- Consider self-hosted models (Ollama, VLLM) for data privacy and to reduce API costs
- Store API keys in secure workflow variables rather than hardcoding them
Limitations
- Model Availability: Pre-configured models require VIDIZMO AI service configuration. Self-hosted models require running instances.
- Token Limits: Each model has maximum token limits. Exceeding limits results in truncated responses or errors.
- API Rate Limits: Commercial providers (OpenAI, Anthropic, Google) enforce rate limits based on subscription tier.
- Conversation History: History accumulates tokens over time. Long conversations may exceed model context windows.